
 redundant parameter




common concern on the lack of discussion on the limitations/possible extensions of our methods, which we discuss

Neural Information Processing Systems

We thank the reviewers for taking the time to carefully read the paper and for their constructive comments. We think this might be feasible. To Reviewer#1: Thank you for your detailed comments. Please also see the revision plan for Reviewer#2. We admit that the claimed "redundant parameters" problem of TMM is a bit artificial. We will (i) compare NAG, TMM, and G-TM (optimal tuning), and provide the guarantee of TMM (Eq. (11) in [7]) in Section 3.1; (ii) regarding the flawed guarantee, thank you for pointing out the intermediate inequality.


AlterMOMA: Fusion Redundancy Pruning for Camera-LiDAR Fusion Models with Alternative Modality Masking

Neural Information Processing Systems

Camera-LiDAR fusion models significantly enhance perception performance in autonomous driving. In practice, camera-LiDAR fusion models utilize pre-trained backbones for efficient training. However, we argue that directly loading single-modal pre-trained camera and LiDAR backbones into camera-LiDAR fusion models introduces similar feature redundancy across modalities due to the nature of the fusion mechanism. Unfortunately, existing pruning methods are developed explicitly for single-modal models, and thus they struggle to effectively identify these specific redundant parameters in camera-LiDAR fusion models. In this paper, to address this issue in camera-LiDAR fusion models, we propose a novel pruning framework, Alternative Modality Masking Pruning (AlterMOMA), which employs alternative masking on each modality and identifies these redundant parameters.
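
Below is a minimal sketch of how alternative modality masking could surface fusion-redundant parameters. It assumes a PyTorch-style fusion model whose forward pass accepts a mask_modality argument and uses a simple first-order saliency as the importance score; the model interface, batch keys, and scoring rule are illustrative assumptions, not the paper's exact procedure.

    import torch

    def importance_with_modality_masked(model, batch, criterion, masked_modality):
        # Hypothetical interface: the fusion model takes camera and LiDAR inputs
        # and can zero out one modality's features via mask_modality.
        # Importance is taken as the first-order saliency |theta * dL/dtheta|.
        model.zero_grad()
        out = model(batch["camera"], batch["lidar"], mask_modality=masked_modality)
        loss = criterion(out, batch["target"])
        loss.backward()
        return {
            name: (p.detach() * p.grad.detach()).abs()
            for name, p in model.named_parameters()
            if p.grad is not None
        }

    def alternative_masking_scores(model, batch, criterion):
        # Alternately mask each modality. Parameters whose importance collapses
        # whenever the other modality is available are candidates for fusion
        # redundancy, i.e. their features are reproduced by the other backbone.
        camera_masked = importance_with_modality_masked(model, batch, criterion, "camera")
        lidar_masked = importance_with_modality_masked(model, batch, criterion, "lidar")
        return camera_masked, lidar_masked
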


The Importance of Being Parameters: An Intra-Distillation Method for Serious Gains

Xu, Haoran, Koehn, Philipp, Murray, Kenton

arXiv.org Artificial Intelligence

Recent model pruning methods have demonstrated the ability to remove redundant parameters without sacrificing model performance. Common methods remove redundant parameters according to parameter sensitivity, a gradient-based measure reflecting the contribution of the parameters. In this paper, however, we argue that redundant parameters can be trained to make beneficial contributions. We first highlight the large sensitivity (contribution) gap between high-sensitivity and low-sensitivity parameters and show that model generalization performance can be significantly improved after balancing the contribution of all parameters. Our goal is to balance the sensitivity of all parameters and encourage all of them to contribute equally. We propose a general task-agnostic method, namely intra-distillation, appended to the regular training loss to balance parameter sensitivity. Moreover, we also design a novel adaptive learning method to control the strength of the intra-distillation loss for faster convergence. Our experiments show the strong effectiveness of our methods on machine translation, natural language understanding, and zero-shot cross-lingual transfer across up to 48 languages, e.g., a gain of 3.54 BLEU on average across 8 language pairs from the IWSLT'14 translation dataset.
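
As a rough illustration of the gradient-based sensitivity measure the abstract refers to, the sketch below scores each parameter tensor with a first-order saliency; the exact definition used in the paper may differ, and the model, batch keys, and criterion are placeholders.

    import torch

    def parameter_sensitivity(model, batch, criterion):
        # Hedged sketch: approximate each parameter's contribution to the loss
        # with the first-order saliency |theta * dL/dtheta|, a common
        # gradient-based proxy; the paper's exact measure may differ.
        model.zero_grad()
        loss = criterion(model(batch["input"]), batch["target"])
        loss.backward()
        return {
            name: (p.detach() * p.grad.detach()).abs().mean().item()
            for name, p in model.named_parameters()
            if p.grad is not None
        }

    # Comparing the largest and smallest per-tensor scores exposes the
    # sensitivity gap; intra-distillation appends an auxiliary term to the
    # regular training loss to shrink that gap.
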